Abstract

Let $BV = BV(\mathbb{R}^d)$ be the space of functions of bounded variation on $\mathbb{R}^d$ with $d \ge 2$. Let $\psi_\lambda$, $\lambda \in \Delta$, be a wavelet basis of compactly supported functions normalized in $BV$, i.e. $|\psi_\lambda|_{BV(\mathbb{R}^d)} = 1$, $\lambda \in \Delta$. Each $f \in BV$ has a unique wavelet expansion $\sum_{\lambda\in\Delta} c_\lambda(f)\psi_\lambda$ with convergence in $L_1(\mathbb{R}^d)$. If $\Lambda_N(f)$ is the set of $N$ indices $\lambda \in \Delta$ for which $|c_\lambda(f)|$ are largest (with ties handled in an arbitrary way), then $\mathcal{G}_N(f) := \sum_{\lambda\in\Lambda_N(f)} c_\lambda(f)\psi_\lambda$ is called a greedy approximation to $f$. It is shown that $|\mathcal{G}_N(f)|_{BV(\mathbb{R}^d)} \le C|f|_{BV(\mathbb{R}^d)}$ with $C$ a constant independent of $f$. This answers in the affirmative a conjecture of Meyer [15] (see p. 79).

AMS  subject  classification:  42C40,  46B70,  26B35,  42B25. 

Key Words: N-term approximation, greedy approximation, functions of bounded variation, thresholding, bounded projections.


1  Introduction 

The space $BV := BV(\Omega)$ of functions of bounded variation on a domain $\Omega \subset \mathbb{R}^d$ is important in mathematics (geometric measure theory, differential geometry) and applications (image processing, nonlinear PDEs). The structure of $BV$ is complicated by the fact that neither it nor the closely related Sobolev space $W^1(L_1(\Omega))$ has an unconditional basis ($BV$ does not even have a basis). Wavelet decompositions of $BV$ functions, while not characterizing this space, give fine information (see [4, 19, 2]) about its structure, and these decompositions can be used to solve various extremal problems.

Consider, for example, the extremal problem

$$K(f,t) := K(f,t;L_2(\Omega),BV(\Omega)) := \inf_{g\in BV(\Omega)} \|f-g\|_{L_2(\Omega)} + t\,|g|_{BV(\Omega)}, \tag{1.1}$$

where $\Omega = [0,1]^2$ and $t > 0$ is a parameter. The expression (1.1) is called a K-functional in interpolation of linear operators. It is used to describe interpolation spaces between $L_2(\Omega)$ and $BV(\Omega)$. This and related functionals also occur in image processing in such problems as denoising and deblurring. The rate of decay of $K(f,t)$ as $t \to 0$ gives information about the smoothness of $f$ relative to $L_2(\Omega)$ and $BV(\Omega)$. A function $g = g_t$ is called a near minimizer (with constant $C$) to (1.1) if

$$\|f-g_t\|_{L_2(\Omega)} + t\,|g_t|_{BV(\Omega)} \le C\,K(f,t).$$

One would like simple constructive methods for finding minimizers or near minimizers to (1.1).

*This work has been supported in part by the NRC New Investigators Twinning Program 2003-2004 as well as the Office of Naval Research Contract N00014-03-1-0051, the Air Force of Scientific Research Contracts UFEIES0302005USC and DAAD 19-02-1-0028, the Foundation for Polish Science and KBN grant 5P03A 03620 located at the Institute of Mathematics of the Polish Academy of Sciences.

In [4], it is shown that thresholding the Haar decomposition of $f$ provides a near minimizer to (1.1). Namely, if $H_\lambda$, $\lambda \in \Delta$, is the Haar basis on $[0,1]^2$, then given $f \in L_2([0,1]^2)$, we can write

$$f = \sum_{\lambda\in\Delta} c_\lambda(f) H_\lambda$$

with the $H_\lambda$ normalized in $L_2([0,1]^2)$ (which is equivalent to normalizing in $BV([0,1]^2)$). For each $t > 0$, a near minimizer $g_t$ is given by thresholding the Haar series:

$$g_t := T_{t^2} f := \sum_{\lambda\in\Lambda(f,t^2)} c_\lambda(f) H_\lambda,$$

where for any $t > 0$

$$\Lambda(f,t) := \{\lambda : |c_\lambda(f)| > t\}.$$

The proof that thresholding is a near minimizer relies on three basic results concerning Haar decompositions and $BV$. To describe these, we introduce the concept of $N$-term approximation using the Haar basis. We define $\Sigma_N$ as the collection of all functions $S = \sum_{\lambda\in\Lambda} c_\lambda H_\lambda$, where $\Lambda \subset \Delta$ is any index set with cardinality $\#(\Lambda) \le N$. Given $f \in L_2([0,1]^2)$, we consider the approximation of $f$ using the elements of $\Sigma_N$:

$$\sigma_N(f)_{L_2([0,1]^2)} := \inf_{S\in\Sigma_N} \|f - S\|_{L_2([0,1]^2)}.$$

The first of these basic results is the following direct estimate (see [4]) for the approximation error:

$$\sigma_N(f)_{L_2([0,1]^2)} \le C_0 N^{-1/2}\,|f|_{BV([0,1]^2)}, \qquad N = 1,2,\dots.$$

This inequality is called an inequality of Jackson type (corresponding to analogous inequalities in approximation by algebraic polynomials). The Jackson inequality is proved by showing that the Haar coefficients of a $BV([0,1]^2)$ function are in weak $\ell_1$. That is,

$$\#(\Lambda(f,\varepsilon)) \le C_0\,|f|_{BV([0,1]^2)}\,\varepsilon^{-1}, \qquad \varepsilon > 0.$$

This weak $\ell_1$ property was shown in [6] to hold in the more general setting of wavelet expansions of functions in $BV(\mathbb{R}^d)$ using compactly supported orthogonal wavelets. This allows the generalization of the Jackson inequality to arbitrary space dimensions and arbitrary compactly supported orthogonal wavelet systems (see Lemma 4.2).
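The passage from the weak $\ell_1$ bound to the Jackson inequality can be made explicit in the Haar case. The following is only a sketch under stated assumptions: $(c_n^*)_{n\ge1}$ denotes the decreasing rearrangement of $(|c_\lambda(f)|)$ and $M$ denotes the weak $\ell_1$ constant of the coefficient sequence (both symbols are introduced here for illustration). Orthonormality of the Haar basis in $L_2([0,1]^2)$ is used in the first equality.

```latex
% Assumption: \#\{\lambda : |c_\lambda(f)| > \varepsilon\} \le M \varepsilon^{-1} for all \varepsilon > 0.
% Then the n-th largest coefficient satisfies c_n^* \le M/n, and
\begin{align*}
\sigma_N(f)_{L_2([0,1]^2)}^2
  &= \sum_{n>N} (c_n^*)^2
   \le M^2 \sum_{n>N} n^{-2}
   \le M^2 \int_N^\infty x^{-2}\,dx
   = M^2 N^{-1},
\end{align*}
% so that \sigma_N(f)_{L_2([0,1]^2)} \le M N^{-1/2}.
```

For a function of bounded variation the constant $M$ is bounded by a multiple of the $BV$ semi-norm, which gives the stated form of the Jackson inequality.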
The second basic result (see [4]) is the Bernstein inequality which (in the case of $[0,1]^2$) says that

$$|S|_{BV([0,1]^2)} \le C_0 N^{1/2}\,\|S\|_{L_2([0,1]^2)}, \qquad S \in \Sigma_N, \quad N = 1,2,\dots.$$

We shall show in §5 that this inequality also generalizes to $\mathbb{R}^d$ and general compactly supported orthogonal wavelet systems.

The Jackson and Bernstein inequalities are not enough to show that thresholding the Haar expansion is an approximate minimizer for (1.1). One also needs the stability of thresholding in $BV([0,1]^2)$:

$$|T_\varepsilon(f)|_{BV([0,1]^2)} \le C_0\,|f|_{BV([0,1]^2)}, \qquad f \in BV([0,1]^2).$$

This remarkable property says that projecting onto any sum involving the $N$ largest wavelet coefficients of the Haar series of a function in $BV([0,1]^2)$ results in a function with controllable $BV([0,1]^2)$ norm. Note that this property does not hold for projecting onto an arbitrary $N$-term sum of the Haar series, nor does it hold in $\mathbb{R}^1$ (see §7). This stability result for Haar expansions was generalized to space dimensions $d > 2$ in [19]. Yves Meyer [15] (see p. 79) has conjectured that this property holds for any compactly supported wavelet system. The main result of this paper is to prove this conjecture.

Theorem 1.1 Let $\varphi$ be a compactly supported univariate scaling function in $BV(\mathbb{R}^1)$ which generates the compactly supported orthogonal wavelet $\psi$. For $d \ge 2$, we consider the multivariate orthogonal wavelet system $(\psi_\lambda)_{\lambda\in\Delta}$ obtained from $\varphi$ and $\psi$, and normalized in $BV(\mathbb{R}^d)$. Then this wavelet system has the following BV stability property. If $f \in BV(\mathbb{R}^d)$, $d \ge 2$, let

$$f = \sum_{\lambda\in\Delta} c_\lambda(f)\psi_\lambda$$

be the wavelet expansion of $f$. For any $N$, let $\Lambda_N(f)$ be the set of $N$ indices $\lambda \in \Delta$ for which $|c_\lambda(f)|$ are largest. Then the nonlinear operator

$$\mathcal{G}_N(f) := \sum_{\lambda\in\Lambda_N(f)} c_\lambda(f)\psi_\lambda$$

satisfies

$$|\mathcal{G}_N(f)|_{BV(\mathbb{R}^d)} \le C(\varphi,d)\,|f|_{BV(\mathbb{R}^d)}.$$
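The selection rule behind the greedy operator is easy to state on a finite list of coefficients. The following is a minimal sketch, assuming the wavelet coefficients have already been computed and stored in a list; the function names `greedy_index_set` and `greedy_coefficients` are hypothetical and not part of the paper. Ties are broken by position, which is one of the arbitrary choices the theorem permits.

```python
def greedy_index_set(coeffs, n_terms):
    """Indices of the n_terms largest coefficients in absolute value.

    Python's sort is stable, so ties keep their original order;
    the theorem allows any tie-breaking rule.
    """
    order = sorted(range(len(coeffs)), key=lambda i: -abs(coeffs[i]))
    return order[:n_terms]


def greedy_coefficients(coeffs, n_terms):
    """Coefficient list of the greedy approximant: keep the selected entries, zero the rest."""
    keep = set(greedy_index_set(coeffs, n_terms))
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


# Illustrative coefficient list (made up for this example).
c = [0.5, -3.0, 2.0, 0.1, -2.0]
print(greedy_index_set(c, 2))
print(greedy_coefficients(c, 2))
```

The sketch only reproduces the index selection; it says nothing about the $BV$ bound itself, which is the content of the theorem.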

As a consequence of this theorem we shall also show that $\mathcal{G}_N(f)$ realizes the K-functional for the pair $(L_{d^*}(\mathbb{R}^d), BV(\mathbb{R}^d))$.

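Theorem 1.2 Let $\varphi$ be a compactly supported univariate scaling function in $BV(\mathbb{R}^1)$ which generates the compactly supported orthogonal wavelet $\psi$. For $d \ge 2$, we consider the multivariate orthogonal wavelet system $(\psi_\lambda)_{\lambda\in\Delta}$ obtained from $\varphi$ and $\psi$, and normalized in $BV(\mathbb{R}^d)$. Then the greedy operator

$$\mathcal{G}_N(f) := \sum_{\lambda\in\Lambda_N(f)} c_\lambda(f)\psi_\lambda,$$

with $\Lambda_N(f)$ the set of $N$ indices $\lambda \in \Delta$ for which $|c_\lambda(f)|$ are largest, satisfies

$$\|f - \mathcal{G}_N(f)\|_{L_{d^*}(\mathbb{R}^d)} + N^{-1/d}\,|\mathcal{G}_N(f)|_{BV(\mathbb{R}^d)} \le C(\varphi,d)\,K(f,N^{-1/d};L_{d^*}(\mathbb{R}^d),BV(\mathbb{R}^d)). \tag{1.2}$$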
2  The  space  BV

There  are  several  treatments  of  the  space  BV.  We  mention  two  valuable  references  [15,  20] 
which  contain  all  of  the  properties  of  BV  functions  that  we  shall  need.  There  are  several 
equivalent  definitions  of  BV.  The  approach  we  take  below  is  simply  the  most  direct  and 
convenient  for  our  setting. 

Let $\Omega$ be an open set in $\mathbb{R}^d$. We begin with the Sobolev space $W^1(L_1(\Omega))$ which is the collection of all functions in $L_1(\Omega)$ such that the distributional gradient $\nabla f$ is also in $L_1(\Omega)$. The semi-norm on this space is

$$|f|_{W^1(L_1(\Omega))} := \|\nabla f\|_{L_1(\Omega)},$$

and the norm for this space is obtained by adding the $L_1(\Omega)$ norm:

$$\|f\|_{W^1(L_1(\Omega))} := |f|_{W^1(L_1(\Omega))} + \|f\|_{L_1(\Omega)}.$$

The space $BV(\Omega)$ can now be defined as the set of all $f \in L_1(\Omega)$ for which there is a sequence $(f_n)$ satisfying

$$\|f - f_n\|_{L_1(\Omega)} \to 0, \qquad \sup_n |f_n|_{W^1(L_1(\Omega))} < \infty. \tag{2.1}$$

The semi-norm on $BV$ is then defined as

$$|f|_{BV(\Omega)} := \inf \liminf_{n\to\infty} |f_n|_{W^1(L_1(\Omega))}, \tag{2.2}$$

where the infimum is taken over all sequences satisfying (2.1). To see that this definition is equivalent to other definitions of $BV$ the reader should consult Theorems 5.2.1 and 5.3.3 in [20].

We  mention  a  couple  of  properties  of  the  BV  norm  that  we  shall  use  in  this  paper. 

Remark 2.1 In the case $\Omega = \mathbb{R}^d$, the functions $f_n$ appearing in (2.1) and (2.2) can be taken to be in $C^\infty(\mathbb{R}^d)$ with compact support.

Remark 2.2 Let $I_0$ be a dyadic cube in $\mathbb{R}^d$ and $I_j$, $j = 1,\dots,m$, be a finite collection of disjoint dyadic cubes each of which is contained in $I_0$. Let $\chi_{I_k}$ be the characteristic function of $I_k$, $k = 0,\dots,m$. Then the function $f = \chi_{I_0} - \sum_{j=1}^m \chi_{I_j}$ has BV semi-norm

$$|f|_{BV(\mathbb{R}^d)} \le \sum_{j=0}^m \mathrm{meas}_{d-1}(\partial I_j), \tag{2.3}$$

where $\partial\Omega$ denotes the boundary of a set $\Omega \subset \mathbb{R}^d$ and $\mathrm{meas}_{d-1}$ is the $(d-1)$-dimensional surface measure.

The second result can be proved directly or derived from the well known co-area formula for BV functions (see [20], p. 231). We have equality in (2.3) if the boundaries of the $I_j$, $j = 0,1,\dots,m$, are disjoint.
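The estimate (2.3) and its equality case can be checked numerically in dimension $d = 2$. The following is only an illustrative sketch, assuming a uniform grid on $[0,1)^2$ and using the sum of absolute jumps across grid edges as a stand-in for the BV semi-norm (for sums of characteristic functions of axis-aligned squares aligned with the grid, this discrete quantity equals the length of the jump set). All function names are hypothetical.

```python
def square_indicator(n, x0, y0, side):
    """Sample the indicator of [x0, x0+side) x [y0, y0+side) on an n x n grid of [0,1)^2."""
    i0, j0, s = round(x0 * n), round(y0 * n), round(side * n)
    return [[1 if (i0 <= i < i0 + s and j0 <= j < j0 + s) else 0
             for j in range(n)] for i in range(n)]


def ring_function(n, outer, holes):
    """Grid samples of chi_{I_0} - sum_j chi_{I_j}; squares are (x0, y0, side) triples."""
    f = square_indicator(n, *outer)
    for hole in holes:
        g = square_indicator(n, *hole)
        f = [[a - b for a, b in zip(row_f, row_g)] for row_f, row_g in zip(f, g)]
    return f


def discrete_tv(f):
    """Sum of |jumps| across grid edges times the edge length, with zero padding outside."""
    n = len(f)
    h = 1.0 / n
    padded = [[0] * (n + 2)] + [[0] + row + [0] for row in f] + [[0] * (n + 2)]
    jumps = 0
    for i in range(n + 2):
        for j in range(n + 2):
            if i + 1 < n + 2:
                jumps += abs(padded[i + 1][j] - padded[i][j])
            if j + 1 < n + 2:
                jumps += abs(padded[i][j + 1] - padded[i][j])
    return h * jumps


def perimeter_sum(outer, holes):
    """Right-hand side of (2.3) for squares in the plane."""
    return 4 * outer[2] + sum(4 * hole[2] for hole in holes)


n = 32
outer = (0.0, 0.0, 1.0)
separated = [(0.25, 0.25, 0.25), (0.625, 0.625, 0.125)]  # boundaries pairwise disjoint
touching = [(0.0, 0.0, 0.5)]                              # shares part of its boundary with I_0

print(discrete_tv(ring_function(n, outer, separated)), perimeter_sum(outer, separated))
print(discrete_tv(ring_function(n, outer, touching)), perimeter_sum(outer, touching))
```

In the first configuration the two numbers agree, matching the equality case; in the second the removed square shares boundary with $I_0$ and the inequality is strict.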

Remark 2.3 If $\Omega_j \subset \mathbb{R}^d$, $j = 1,\dots,m$, is a partition of $\Omega$, then

$$\sum_{j=1}^m |f|_{BV(\Omega_j)} \le |f|_{BV(\Omega)}. \tag{2.4}$$

This follows from the set additivity of the $L_1$ semi-norm in the case $f \in W^1(L_1(\Omega))$ and by taking limits in the general case $f \in BV(\Omega)$.

3  Wavelet  decompositions

We shall limit our analysis to the case of compactly supported orthogonal wavelets on $\mathbb{R}^d$. The results we put forward in this paper hold equally well for biorthogonal compactly supported wavelets with the same proofs but somewhat more cumbersome notation.

Let $\varphi$ be a compactly supported univariate scaling function with orthogonal shifts which satisfies the two scale relation

$$\varphi(x) = \sum_k a_k\,\varphi(2x - k),$$

where only a finite number of the $a_k$ are nonzero. We shall assume throughout this paper that $\varphi$ is in $BV(\mathbb{R}^1)$. Let $\psi$ be the univariate wavelet function with compact support which is obtained from $\varphi$ by multiresolution. Examples of such wavelets and scaling functions were given by Daubechies [8].

We use the standard construction of multidimensional wavelet bases. Let $E'$ denote the set of vertices of the cube $[0,1]^d$ and $E$ denote the set of nonzero vertices. We shall use the notation $\psi^0 := \varphi$ and $\psi^1 := \psi$. For each $e = (e_1,\dots,e_d) \in E'$, we define

$$\psi^e(x_1,\dots,x_d) := \psi^{e_1}(x_1)\cdots\psi^{e_d}(x_d).$$

Let $\mathcal{D}$ denote the set of dyadic cubes in $\mathbb{R}^d$, let $\mathcal{D}_k$ denote those dyadic cubes which have sidelength $2^{-k}$, and let $\mathcal{D}_+ := \cup_{k\ge0}\mathcal{D}_k$. For any dyadic cube $I = 2^{-k}(j + [0,1]^d) \in \mathcal{D}_k$, $k \in \mathbb{Z}$, $j \in \mathbb{Z}^d$, we define the functions

$$\psi_I^e(x) := \gamma(I,e)\,\psi^e(2^k x - j), \qquad e \in E',$$

with the $\gamma(I,e) > 0$ chosen so that

$$|\psi_I^e|_{BV(\mathbb{R}^d)} = 1, \qquad I \in \mathcal{D}, \quad e \in E'.$$

These functions are scaled to $I$. It follows that the constants $\gamma(I,e) = |I|^{-1/d^*}\gamma(e)$,¹ with $d^* := \frac{d}{d-1}$, and therefore we have

$$c_1 \le \|\psi_I^e\|_{L_{d^*}(\mathbb{R}^d)} \le c_2, \qquad I \in \mathcal{D}, \quad e \in E',$$

with constants $c_1, c_2$ depending only on $\psi$ and $d$. In other words, normalization in $BV$ is equivalent to normalization in $L_{d^*}$.
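The scaling behind this equivalence can be written out directly. The computation below is a sketch: it uses only the change of variables $y = 2^k x - j$ and the identity $|I| = 2^{-kd}$ for $I \in \mathcal{D}_k$; the explicit value $\gamma(e) = 1/|\psi^e|_{BV(\mathbb{R}^d)}$ is what this computation yields and is stated here as a derived assumption rather than quoted from the text.

```latex
% Let g(x) := \psi^e(2^k x - j) with I = 2^{-k}(j + [0,1]^d), so |I| = 2^{-kd} and 1/d^* = (d-1)/d.
\begin{align*}
|g|_{BV(\mathbb{R}^d)}
  &= 2^{k}\,2^{-kd}\,|\psi^e|_{BV(\mathbb{R}^d)}
   = |I|^{1/d^*}\,|\psi^e|_{BV(\mathbb{R}^d)}, \\
\|g\|_{L_{d^*}(\mathbb{R}^d)}
  &= 2^{-kd/d^*}\,\|\psi^e\|_{L_{d^*}(\mathbb{R}^d)}
   = |I|^{1/d^*}\,\|\psi^e\|_{L_{d^*}(\mathbb{R}^d)}.
\end{align*}
% Hence |\psi_I^e|_{BV} = 1 forces \gamma(I,e) = |I|^{-1/d^*} / |\psi^e|_{BV}, and then
\[
\|\psi_I^e\|_{L_{d^*}(\mathbb{R}^d)}
  = \frac{\|\psi^e\|_{L_{d^*}(\mathbb{R}^d)}}{|\psi^e|_{BV(\mathbb{R}^d)}},
\]
% which does not depend on I and takes only finitely many values as e ranges over E'.
```

Both quantities pick up the same factor $|I|^{1/d^*}$ under dilation, which is why the two normalizations differ only by constants independent of $I$.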

To simplify the notation that follows, we introduce the indexing set $\Delta$ which consists of all pairs $\lambda = (I,e)$ with $I \in \mathcal{D}_+$ and $e \in E$ ($e \in E'$ if $I \in \mathcal{D}_0$). We define $|\lambda| := k$ when $I \in \mathcal{D}_k$. The set of functions $\{\psi_\lambda\}_{\lambda\in\Delta}$ is a complete orthogonal system. Any locally integrable function $f$ on $\mathbb{R}^d$ has a formal wavelet series

$$f = \sum_{\lambda\in\Delta} c_\lambda(f)\psi_\lambda,$$

where the wavelet coefficients $c_\lambda(f)$ are given by

$$c_\lambda(f) := c_I^e(f) := \langle f(\cdot),\, \gamma'(e,I)\,\psi^e(2^k\cdot - j)\rangle, \qquad \lambda = (I,e) \in \Delta, \quad I = 2^{-k}(j + [0,1]^d),$$

where the normalization factors $\gamma'(e,I)$ of the dual wavelet scale like $\gamma'(e,I) \sim |I|^{-1/d}$.

¹ Throughout this paper, we shall use the notation $|A|$ to denote the Lebesgue measure of a set $A \subset \mathbb{R}^d$.

The set of functions $\{\psi_\lambda\}_{\lambda\in\Delta}$ is a basis for many function spaces. For example, they are an orthogonal basis for $L_2(\mathbb{R}^d)$. They are an unconditional basis for the $L_p$ spaces, $1 < p < \infty$, and for the Besov spaces whenever they admit an unconditional basis. They are a basis for $W^1(L_1(\Omega))$, but not an unconditional one (this space does not admit an unconditional basis).

We shall use the abbreviated notation $\phi := \psi^{(0,\dots,0)}$ for the function which is a tensor product of scaling functions. Similarly, we write

$$\phi_I(x) := |I|^{-1/d^*}\,\phi(2^k x - j), \qquad I = 2^{-k}(j + [0,1]^d),$$

to index the scaling functions at level $k$. The shift invariant space $S_k := S_k(\phi)$ is the span of the functions $\phi_I$, $I \in \mathcal{D}_k$. Each space $S_k$ is a dilate of the space $S_0$. At each dyadic level $k$, the shifts $\phi_I$, $I \in \mathcal{D}_k$, sum to a constant:

$$\sum_{I\in\mathcal{D}_k} \phi_I = c\,|I|^{-1/d^*} \tag{3.1}$$

with $c$ a constant. Any wavelet $\psi_\lambda$ or scaling function at a dyadic level $j < k$ (i.e. $|\lambda| = j$) is an element in $S_k$ and can be written as a finite linear combination of the $\phi_I$, $I \in \mathcal{D}_k$.

4  Approximation  by  piecewise  constants 

We shall use in the course of our proofs some results on approximation of BV functions by piecewise constant functions. Throughout this and the next section, we assume that $d \ge 2$. The results we shall need are for the most part proved in two earlier works: [4] (for the case $d = 2$) and [19] (for the case $d > 2$).

We shall discuss three types of approximation by piecewise constants. The first of these is $N$-term approximation using Haar functions. In this case, we can be more general and treat $N$-term approximation using compactly supported wavelets. So let $(\psi_\lambda)_{\lambda\in\Delta}$ be one of the wavelet bases introduced in the previous section. We take the basis functions $\psi_\lambda$ to be normalized in $BV$.

We define the nonlinear space

$$\Sigma_N^w := \Big\{ S = \sum_{\lambda\in\Lambda} a_\lambda\psi_\lambda \;:\; \Lambda \subset \Delta,\ \#(\Lambda) \le N \Big\}.$$

Thus, each element in $\Sigma_N^w$ is a linear combination of at most $N$ wavelets which can occur at arbitrary positions or scales.

We define the error in approximating $f \in L_p(\mathbb{R}^d)$ by the elements of $\Sigma_N^w$ by

$$\sigma_N^w(f)_{L_p(\mathbb{R}^d)} := \inf_{S\in\Sigma_N^w} \|f - S\|_{L_p(\mathbb{R}^d)}. \tag{4.1}$$

A fundamental result in wavelet approximation [17] is that the approximation error $\sigma_N^w(f)_{L_p(\mathbb{R}^d)}$ can be obtained up to a constant $C(d,\varphi)$ by greedy approximation. We describe this result only in the case $p = d^*$, although it holds for all $1 < p < \infty$ when one uses wavelets normalized in $L_p(\mathbb{R}^d)$. For each $N = 1,2,\dots$, we define the greedy approximant

$$\mathcal{G}_N(f) := \sum_{\lambda\in\Lambda_N(f)} c_\lambda(f)\psi_\lambda,$$

where $\Lambda_N(f)$ is the set of the $N$ indices of the largest coefficients $c_\lambda(f)$, $\lambda \in \Delta$, in absolute value (ties in the size of these coefficients can be handled in an arbitrary way). Then, we have

Proposition 4.1 For any $f \in L_{d^*}(\mathbb{R}^d)$, we have

$$\|f - \mathcal{G}_N(f)\|_{L_{d^*}(\mathbb{R}^d)} \le C(d,\varphi)\,\sigma_N^w(f)_{L_{d^*}(\mathbb{R}^d)}. \tag{4.2}$$

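Proof: This result can be derived easily from a result of [17] where it is shown that (4.2) holds when $\mathcal{G}_N(f)$ is replaced by $\mathcal{G}_N^{d^*}(f)$. Here, $\mathcal{G}_N^{d^*}(f)$ is defined as above except that one starts with the wavelet coefficients normalized in $L_{d^*}$ instead of $BV$.

Let $\{\psi_\lambda\}_{\lambda\in\Delta}$ be as usual the wavelet basis normalized for $BV$: $|\psi_\lambda|_{BV} = 1$. For each $\lambda \in \Delta$, we choose $b_\lambda$ such that $\|b_\lambda\psi_\lambda\|_{L_{d^*}(\mathbb{R}^d)} = 1$. The equivalence of the $BV$ and $L_{d^*}$ normalizations gives that $c_1 \le \|\psi_\lambda\|_{L_{d^*}(\mathbb{R}^d)} \le c_2$, with $c_1, c_2 > 0$ independent of $\lambda$. Because of the unconditionality of the wavelet basis for $L_{d^*}(\mathbb{R}^d)$, there are $C_1, C_2 > 0$ such that for any sequence of coefficients $\{a_\lambda\}_{\lambda\in\Delta}$,

$$C_1\Big\|\sum_{\lambda\in\Delta} a_\lambda\psi_\lambda\Big\|_{L_{d^*}(\mathbb{R}^d)} \le \Big\|\sum_{\lambda\in\Delta} a_\lambda b_\lambda\psi_\lambda\Big\|_{L_{d^*}(\mathbb{R}^d)} \le C_2\Big\|\sum_{\lambda\in\Delta} a_\lambda\psi_\lambda\Big\|_{L_{d^*}(\mathbb{R}^d)}. \tag{4.3}$$

Given any function $f = \sum_{\lambda\in\Delta} c_\lambda(f)\psi_\lambda$ in $L_{d^*}(\mathbb{R}^d)$, we let $g := \sum_{\lambda\in\Delta} c_\lambda(f) b_\lambda\psi_\lambda$, which by (4.3) is also in $L_{d^*}(\mathbb{R}^d)$. If $\mathcal{G}_N(f) = \sum_{\lambda\in\Lambda_N(f)} c_\lambda(f)\psi_\lambda$, then $\mathcal{G}_N^{d^*}(g) = \sum_{\lambda\in\Lambda_N(f)} c_\lambda(f) b_\lambda\psi_\lambda$. Hence, using (4.3), we have

$$C_1\|f - \mathcal{G}_N(f)\|_{L_{d^*}(\mathbb{R}^d)} \le \|g - \mathcal{G}_N^{d^*}(g)\|_{L_{d^*}(\mathbb{R}^d)} \le C(d,\varphi)\,\sigma_N^w(g)_{L_{d^*}(\mathbb{R}^d)}. \tag{4.4}$$

On the other hand, if $S = \sum_{\lambda\in\Lambda} a_\lambda\psi_\lambda$ is a best $N$-term approximation to $f$ in $L_{d^*}(\mathbb{R}^d)$, then, using (4.3) again, we have

$$\sigma_N^w(g)_{L_{d^*}(\mathbb{R}^d)} \le \Big\|g - \sum_{\lambda\in\Lambda} a_\lambda b_\lambda\psi_\lambda\Big\|_{L_{d^*}(\mathbb{R}^d)} \le C_2\|f - S\|_{L_{d^*}(\mathbb{R}^d)} = C_2\,\sigma_N^w(f)_{L_{d^*}(\mathbb{R}^d)}. \tag{4.5}$$

The estimates (4.4) and (4.5) combine to prove the proposition. □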

We are interested in quantitative estimates for the approximation error $\sigma_N^w(f)_{L_{d^*}(\mathbb{R}^d)}$ whenever $f \in BV(\mathbb{R}^d)$. This will be provided by the following lemma.

Lemma 4.2 For any function $f \in BV(\mathbb{R}^d)$ we have the estimate

$$\sigma_N^w(f)_{L_{d^*}(\mathbb{R}^d)} \le C(d,\varphi)\,N^{-1/d}\,|f|_{BV(\mathbb{R}^d)}, \qquad N = 1,2,\dots.$$

Proof: The set $\mathcal{A}^\alpha_\infty(L_p(\mathbb{R}^d))$ of functions $f \in L_p(\mathbb{R}^d)$ which satisfy

$$\sigma_N^w(f)_{L_p(\mathbb{R}^d)} \le C N^{-\alpha} \tag{4.6}$$

is called an approximation space. The norm $\|f\|_{\mathcal{A}^\alpha_\infty(L_p(\mathbb{R}^d))}$ in this space is the smallest $C > 0$ for which (4.6) is valid. For $1 < p < \infty$ and $\alpha > 0$, it was proved in [5] that $f \in \mathcal{A}^\alpha_\infty(L_p(\mathbb{R}^d))$ if and only if the sequence $(\|c_\lambda(f)\psi_\lambda\|_{L_p(\mathbb{R}^d)})_{\lambda\in\Delta}$ is in the space weak $\ell_\tau$ (denoted by $w\ell_\tau$) with $\frac{1}{\tau} = \alpha + \frac{1}{p}$. Moreover, $\|f\|_{\mathcal{A}^\alpha_\infty(L_p(\mathbb{R}^d))}$ is equivalent to $\|(\|c_\lambda(f)\psi_\lambda\|_{L_p(\mathbb{R}^d)})_{\lambda\in\Delta}\|_{w\ell_\tau}$. In the case of interest to us, we have $p = d^* = \frac{d}{d-1}$ and $\alpha = 1/d$ so that $\tau = 1$. It was shown in [4] (for the case of Haar wavelets) and in [6] (for general wavelets) that the wavelet coefficients of a BV function $f$ are in weak $\ell_1$ and satisfy

$$\big\|(\|c_\lambda(f)\psi_\lambda\|_{L_{d^*}(\mathbb{R}^d)})_{\lambda\in\Delta}\big\|_{w\ell_1} \le C(\varphi,d)\,|f|_{BV(\mathbb{R}^d)}.$$

Therefore, the lemma follows. □
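The weak $\ell_\tau$ quasi-norm used in this proof has a simple description for finite sequences: it is the largest value of $n^{1/\tau} c_n^*$, where $(c_n^*)$ is the decreasing rearrangement of the absolute values. The following is a minimal sketch under that standard description; the function name `weak_l_tau_norm` and the test sequence are illustrative assumptions, not taken from the paper.

```python
def weak_l_tau_norm(values, tau=1.0):
    """Weak l_tau quasi-norm of a finite, nonempty sequence.

    Computed as max over n of n**(1/tau) * c_n, where c_1 >= c_2 >= ...
    is the decreasing rearrangement of the absolute values.
    """
    decreasing = sorted((abs(v) for v in values), reverse=True)
    return max((n ** (1.0 / tau)) * v for n, v in enumerate(decreasing, start=1))


def count_above(values, eps):
    """Number of entries with absolute value strictly larger than eps."""
    return sum(1 for v in values if abs(v) > eps)


# The harmonic sequence 1, 1/2, ..., 1/50: bounded weak l_1 quasi-norm,
# while its l_1 norm grows like log(50).
harmonic = [1.0 / n for n in range(1, 51)]
print(weak_l_tau_norm(harmonic), sum(harmonic))

# The counting form: eps * #{|c| > eps} never exceeds the weak l_1 quasi-norm.
for eps in (0.3, 0.05, 0.01):
    print(eps, eps * count_above(harmonic, eps))
```

The harmonic sequence illustrates why the weak $\ell_1$ condition is strictly weaker than summability of the coefficients.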

In the case of $\varphi = \chi_{[0,1]}$, the wavelets $\psi_\lambda$, $\lambda \in \Delta$, are the Haar wavelets and the elements in $\Sigma_N^w$ are piecewise constant functions which take at most $CN$ values. We shall now consider two other types of nonlinear approximation using piecewise constants which will be important for us later. For the first of these, let

$$\Sigma_N^c := \Big\{ S = \sum_{I\in\Lambda} a_I\chi_I \;:\; \#(\Lambda) \le N \Big\},$$

where $\Lambda \subset \mathcal{D}$ is a set of dyadic cubes and for each set $S$ in $\mathbb{R}^d$, $\chi_S$ denotes the characteristic function of $S$. Note that we do not require that the cubes in $\Lambda$ are disjoint. In analogy with (4.1), we define

$$\sigma_N^c(f)_{L_p(\mathbb{R}^d)} := \inf_{S\in\Sigma_N^c} \|f - S\|_{L_p(\mathbb{R}^d)}.$$

Since each Haar wavelet $H_\lambda$ is a linear combination of at most $2^d$ characteristic functions of dyadic cubes, it follows that $\Sigma_N^w \subset \Sigma_{2^d N}^c$ and hence from Lemma 4.2, we have

$$\sigma_N^c(f)_{L_{d^*}(\mathbb{R}^d)} \le C(d,\varphi)\,N^{-1/d}\,|f|_{BV(\mathbb{R}^d)}, \qquad N = 1,2,\dots.$$

Lastly, we shall consider approximation by dyadic rings. If $I$ and $J \subset I$ are two distinct dyadic cubes ($J$ may be the empty set), then we define the dyadic ring $R = R(I,J)$ to be the set $R = I \setminus J$. Consider the nonlinear space

$$\Sigma_N^r := \Big\{ S = \sum_{R\in\mathcal{P}} a_R\chi_R \;:\; \#(\mathcal{P}) \le N \Big\},$$

where $\mathcal{P}$ is a family of disjoint rings (i.e. any two $R$ in $\mathcal{P}$ are disjoint). Note that $\chi_R = \chi_I - \chi_J$, and therefore

$$\Sigma_N^r \subset \Sigma_{2N}^c.$$

In analogy with the approximation errors defined above for $N$-term approximation by wavelets and constants, we define

$$\sigma_N^r(f)_{L_p(\mathbb{R}^d)} := \inf_{S\in\Sigma_N^r} \|f - S\|_{L_p(\mathbb{R}^d)}.$$

The following lemma concerning approximation by the elements of $\Sigma_N^r$ was proved in [4] for the case $d = 2$ and in [19] for the case of general $d$ (see Proposition 18).

Lemma 4.3 For any function $f \in BV(\mathbb{R}^d)$ we have the estimate

$$\sigma_N^r(f)_{L_{d^*}(\mathbb{R}^d)} \le C(d,\varphi)\,N^{-1/d}\,|f|_{BV(\mathbb{R}^d)}, \qquad N = 1,2,\dots. \tag{4.7}$$

This result was proved in [4, 19] for functions in $BV([0,1]^d)$. However, we can deduce it for general functions in $BV(\mathbb{R}^d)$ using the following argument which we will also apply later in similar settings. First, it is enough to prove this result for functions with compact support since it then follows for general $f$ by a limiting argument (see the definition of $BV(\mathbb{R}^d)$ given in (2.1) and Remark 2.1). Suppose then that $f$ is supported on $Q_k := [-2^{k-1}, 2^{k-1}]^d$ for some $k \ge 1$. We consider the mapping $\eta(x) := 2^k(x - e/2)$ where $e := (1,1,\dots,1) \in \mathbb{Z}^d$. Then $\eta$, which is composed of a shift (by $e/2$) and then a dyadic dilation (by $2^k$), maps $[0,1]^d$ onto $Q_k$. Moreover, $\eta$ maps any dyadic cube properly contained in $[0,1]^d$ into a dyadic cube contained in $Q_k$. Now let $g := f(\eta)$ and apply the analogue of (4.7) for $[0,1]^d$ to $g$. This result gives a partition $\mathcal{P}$ with $\#(\mathcal{P}) \le N$ and a function $S = \sum_{R\in\mathcal{P}} c_R\chi_R$, where the $R \in \mathcal{P}$ are all of the form $R = I \setminus J$ with $I, J$ dyadic subcubes of $[0,1]^d$. If one of these $R$ has $I = [0,1]^d$ then we can replace this $R$ by at most $2^d$ rings corresponding to each of the children of $[0,1]^d$, and in this way we can assume that any ring in $\mathcal{P}$ involves dyadic cubes with sidelength $< 1$. The function $S(\eta^{-1})$ is in $\Sigma_{CN}^r$. From the fact that $S$ approximates $g$ in $L_{d^*}([0,1]^d)$ to the accuracy $C(d,\varphi)N^{-1/d}|g|_{BV([0,1]^d)}$ we deduce that $S(\eta^{-1})$ approximates $f$ to the accuracy $C(d,\varphi)N^{-1/d}|f|_{BV(Q_k)}$ (recall that the $L_{d^*}$ and $BV$ norms scale the same under dilation). This then gives (4.7).
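The claim that the shift-and-dilate map sends dyadic subcubes of the unit cube to dyadic cubes can be checked with exact rational arithmetic. The following is a sketch under stated assumptions: a dyadic cube $2^{-m}(j + [0,1]^d)$ is encoded as a pair `(m, j)`, the map is $x \mapsto 2^k(x - e/2)$, and the helper names are hypothetical. The closed form for the image cube, level $m - k$ and index $j - 2^{m-1}e$, is derived here and requires $m \ge 1$.

```python
from fractions import Fraction


def eta(x, k):
    """The map x -> 2^k (x - e/2), applied coordinatewise with exact rationals."""
    return tuple(2 ** k * (Fraction(xi) - Fraction(1, 2)) for xi in x)


def cube_corners(level, j):
    """Lower and upper corners of the dyadic cube 2^-level (j + [0,1]^d)."""
    side = Fraction(2) ** (-level)
    lower = tuple(side * ji for ji in j)
    upper = tuple(side * (ji + 1) for ji in j)
    return lower, upper


def image_cube(m, j, k):
    """Dyadic cube (level, index) that eta maps 2^-m (j + [0,1]^d) onto; needs m >= 1."""
    if m < 1:
        raise ValueError("the cube must be properly contained in the unit cube")
    return m - k, tuple(ji - 2 ** (m - 1) for ji in j)


# Example: the cube [1/4, 1/2] x [3/4, 1] of sidelength 1/4, mapped with k = 3.
m, j, k = 2, (1, 3), 3
lower, upper = cube_corners(m, j)
mapped = (eta(lower, k), eta(upper, k))
print(mapped)
print(cube_corners(*image_cube(m, j, k)))
```

Both printed pairs of corners coincide, and the image lies in $[-2^{k-1}, 2^{k-1}]^d$, as the argument requires.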

Let us make one last observation about approximation using the elements of $\Sigma_N^r$. Given a locally integrable function $f$, for each measurable set $\Omega \subset \mathbb{R}^d$, we denote by

$$f_\Omega := \frac{1}{|\Omega|}\int_\Omega f(x)\,dx$$

the average of $f$ over $\Omega$.

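Lemma 4.4 If $f \in BV(\mathbb{R}^d)$, there is a collection $\mathcal{P}$ of disjoint rings $R$, such that $\#(\mathcal{P}) \le N$ and the function

$$\mathcal{R}_N(f) := \sum_{R\in\mathcal{P}} f_R\,\chi_R$$

satisfies

$$\|f - \mathcal{R}_N(f)\|_{L_{d^*}(\mathbb{R}^d)} \le C(\varphi,d)\,N^{-1/d}\,|f|_{BV(\mathbb{R}^d)}. \tag{4.8}$$

Proof: Let $S \in \Sigma_N^r$ satisfy

$$\|f - S\|_{L_{d^*}(\mathbb{R}^d)} \le C(\varphi,d)\,N^{-1/d}\,|f|_{BV(\mathbb{R}^d)}. \tag{4.9}$$

The existence of such a function is guaranteed by (4.7). We can write $S = \sum_{R\in\mathcal{P}} c_R\chi_R$, where $\mathcal{P}$ is a collection of at most $N$ disjoint rings. From the disjointness of the rings in $\mathcal{P}$, we have

$$\|f - \mathcal{R}_N(f)\|^{d^*}_{L_{d^*}(\mathbb{R}^d)} = \sum_{R\in\mathcal{P}} \|f - f_R\|^{d^*}_{L_{d^*}(R)} + \|f\|^{d^*}_{L_{d^*}(\Gamma)}, \qquad \Gamma := \mathbb{R}^d \setminus \Big(\bigcup_{R\in\mathcal{P}} R\Big), \tag{4.10}$$

and

$$\|f - S\|^{d^*}_{L_{d^*}(\mathbb{R}^d)} = \sum_{R\in\mathcal{P}} \|f - c_R\|^{d^*}_{L_{d^*}(R)} + \|f\|^{d^*}_{L_{d^*}(\Gamma)}. \tag{4.11}$$

On the other hand,

$$\|f - f_R\|_{L_{d^*}(R)} \le 2\inf_c \|f - c\|_{L_{d^*}(R)} \le 2\|f - c_R\|_{L_{d^*}(R)}.$$

This follows from the fact that the mapping $f \to f_\Omega$ is a norm one projector on $L_{d^*}(\mathbb{R}^d)$. When this is used in (4.10), then (4.11) and (4.9) prove (4.8). □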
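The factor $2$ in the comparison between the average and an arbitrary constant can be traced in two lines. The following is a sketch, writing $p$ for $d^*$ and $g_R$ for the average of $g$ over a ring $R$ with $0 < |R| < \infty$; it uses only the triangle inequality and Hölder's inequality.

```latex
% Averaging does not increase the L_p(R) norm (Hoelder's inequality):
\[
\|g_R\|_{L_p(R)} = |g_R|\,|R|^{1/p}
  \le |R|^{1/p - 1}\,\|g\|_{L_1(R)}
  \le \|g\|_{L_p(R)} .
\]
% Apply this with g = c - f, noting that c - f_R = (c - f)_R for any constant c:
\begin{align*}
\|f - f_R\|_{L_p(R)}
  &\le \|f - c\|_{L_p(R)} + \|c - f_R\|_{L_p(R)} \\
  &=   \|f - c\|_{L_p(R)} + \|(c - f)_R\|_{L_p(R)}
   \le 2\,\|f - c\|_{L_p(R)} .
\end{align*}
```

Taking the infimum over $c$ gives the first inequality in the display above, and choosing $c$ as the coefficient of $S$ on $R$ gives the second.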


5  Inverse  inequalities 

There are certain inequalities (called Bernstein inequalities) which are companion to the Jackson inequalities. It was shown in [4] (for the case $d = 2$) and [19] (for the case $d > 2$) that any $S \in \Sigma_N^c$ satisfies

$$|S|_{BV(\mathbb{R}^d)} \le C N^{1/d}\,\|S\|_{L_{d^*}(\mathbb{R}^d)}. \tag{5.1}$$

This inequality was proved when $S$ was supported on $[0,1]^d$ in the above references. If $S \in \Sigma_N^c$ we can assume that $\mathrm{supp}\, S \subset [-K,K]^d$ for some $K$, and by dilation and shifts we can map $[-K,K]^d \to [0,1]^d$ and deduce the general case (5.1) from that for $[0,1]^d$.

From (5.1), it follows that the same Bernstein inequality holds when $S \in \Sigma_N^r$ or $S \in \Sigma_N^w$ when the wavelet is the Haar wavelet. It will follow from the results of this section that the Bernstein inequality also holds for $\Sigma_N^w$ for general compactly supported wavelets. However, our more general goal is to prove a Bernstein inequality for functions that are a sum of elements from both $\Sigma_N^w$ and $\Sigma_N^c$.

We begin with a local Bernstein inequality between $BV(\mathbb{R}^d)$ and $L_{d^*}(\mathbb{R}^d)$ with $d^* = \frac{d}{d-1}$. For any $I \in \mathcal{D}$ we denote by $I'$ a general set of the form $I \setminus \bigcup_{j=1}^{2^d} J_j$ where each $J_j$ is a (possibly empty) dyadic subcube of the child $I_j$, $j = 1,\dots,2^d$, of $I$.

Lemma 5.1 For each $f \in S_k$, each $I \in \mathcal{D}_k$, and any of the sets $I'$, we have

$$|f\chi_{I'}|_{BV(\mathbb{R}^d)} \le C(\varphi,d)\,\|f\chi_{I'}\|_{L_{d^*}(\mathbb{R}^d)}.$$

Proof: First of all, by dilation and translation, we can assume $k = 0$ and that $I = [0,1]^d$. We fix one of the children $I_j$ of $I$ and denote $I'_j := I_j \setminus J_j$. It is enough to show

$$|f\chi_{I'_j}|_{BV(\mathbb{R}^d)} \le c_0\,\|f\chi_{I'_j}\|_{L_1(\mathbb{R}^d)}, \qquad j = 1,\dots,2^d, \tag{5.2}$$

with $c_0$ a constant depending only on $\varphi, d$. Indeed, we have $\chi_{I'} = \sum_{j=1}^{2^d} \chi_{I'_j}$ and from (5.2)

$$|f\chi_{I'}|_{BV(\mathbb{R}^d)} \le \sum_{j=1}^{2^d} |f\chi_{I'_j}|_{BV(\mathbb{R}^d)} \le c_0 \sum_{j=1}^{2^d} \|f\chi_{I'_j}\|_{L_1(\mathbb{R}^d)} = c_0\,\|f\chi_{I'}\|_{L_1(\mathbb{R}^d)} \le c_0\,\|f\chi_{I'}\|_{L_{d^*}(\mathbb{R}^d)},$$

where the last inequality uses Hölder's inequality and $|I'| \le 1$.

To prove (5.2), we let $J$ be one of the $I_j$, fix $J$, and let $J' = J \setminus J_j$. The result for the other $I_j$ will follow by translation. We first observe that since the space $S_0$ has dimension $\le C(\varphi,d)$ on $J$, we have (by equivalence of norms on a finite dimensional space) that

$$\|f\chi_J\|_{L_\infty(\mathbb{R}^d)} \le c_1\,\|f\chi_J\|_{L_1(\mathbb{R}^d)}, \qquad f \in S_0, \tag{5.3}$$

and

$$|f\chi_J|_{BV(\mathbb{R}^d)} \le \|f\chi_J\|_{BV(\mathbb{R}^d)} \le c_1\,\|f\chi_J\|_{L_1(\mathbb{R}^d)}, \qquad f \in S_0, \tag{5.4}$$

with $c_1$ depending only on $\psi$ and $d$.

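We consider two cases. The first is that $J'$ is obtained from $J$ by removing a cube with measure $\le \delta$, where $\delta$ will be specified in a moment. In this case, we note that

$$\|f\chi_J\|_{L_1(\mathbb{R}^d)} \le \|f\chi_J\|_{L_1(J')} + |J \setminus J'| \cdot \|f\chi_J\|_{L_\infty(J)} \le \|f\chi_J\|_{L_1(J')} + c_1\,|J \setminus J'| \cdot \|f\chi_J\|_{L_1(\mathbb{R}^d)}. \tag{5.5}$$

Now, we select $\delta := \frac{1}{2c_1}$. Then, whenever $|J \setminus J'| \le \delta$, we have $c_1|J \setminus J'| \le \frac12$ and therefore (5.5) gives

$$\|f\chi_J\|_{L_1(\mathbb{R}^d)} \le 2\,\|f\chi_{J'}\|_{L_1(\mathbb{R}^d)}. \tag{5.6}$$

Next, we note that

$$|f\chi_{J'}|_{BV(\mathbb{R}^d)} \le |f\chi_J|_{BV(\mathbb{R}^d)} + \mathrm{meas}_{d-1}(\partial(J \setminus J'))\,\|f\chi_J\|_{L_\infty(\mathbb{R}^d)} \le c_1\big(1 + \mathrm{meas}_{d-1}(\partial(J \setminus J'))\big)\,\|f\chi_J\|_{L_1(\mathbb{R}^d)}$$

$$\le c_1(1 + 2^{d+1})\,\|f\chi_J\|_{L_1(\mathbb{R}^d)} \le 2c_1(1 + 2^{d+1})\,\|f\chi_{J'}\|_{L_1(\mathbb{R}^d)}.$$

In the above inequalities, we have used relations (5.3), (5.4), (5.6), and the fact that $\mathrm{meas}_{d-1}(\partial(J \setminus J')) \le 2^{d+1}$. Thus, we have proved (5.2) in the case $|J \setminus J'| \le \delta$.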

To complete the proof, we consider the case when the dyadic cube $J \setminus J'$ has measure $> \delta$. For each such $J'$ we have (by equivalence of norms on the finite dimensional space $S_0|_{J'}$, see for comparison (5.4))

$$\|f\chi_{J'}\|_{BV(\mathbb{R}^d)} \le c(J',\varphi,d)\,\|f\chi_{J'}\|_{L_1(\mathbb{R}^d)}.$$

There is a finite number of such sets $J'$ and therefore by enlarging the constant from the first case (if necessary), we obtain (5.2) for all $J'$ in the second case as well. This proves (5.2) and as noted earlier proves the Lemma. □

We shall utilize a construction given in [10]. Let $\Lambda$ be any finite collection of dyadic cubes. Given $I \in \Lambda$, we define the set $B(I) = B(I,\Lambda)$ of maximal cubes in $I$:

$$B(I,\Lambda) := \{ J \in \Lambda : J \subset I,\ J \ne I, \text{ and if } J' \in \Lambda \text{ with } J' \subset I,\ J' \ne I,\ J' \cap J \ne \emptyset, \text{ then } J' \subset J \}.$$

The following Lemma was proved in [10].

Lemma 5.2 If $\Lambda \subset \mathcal{D}$ is any finite collection of dyadic cubes, then there exists a set of dyadic cubes $\tilde\Lambda$ such that

(i) $\Lambda \subset \tilde\Lambda$ and $\#(\tilde\Lambda) \le 2^d\,\#(\Lambda)$,

(ii) for each cube $I \in \tilde\Lambda$, $\#(B(I,\tilde\Lambda)) \le 2^d$, where the $B(I,\tilde\Lambda)$ are defined relative to $\tilde\Lambda$,

(iii) each child of $I$ contains at most one cube from $B(I,\tilde\Lambda)$.

Let us note that in [10] this lemma was proved for the case when the cubes in $\Lambda$ are contained in $[0,1]^d$. However, we can deduce the lemma as stated above from this by using the following reasoning. It follows by shifts of dyadic cubes that the lemma is true if all of the dyadic cubes of $\Lambda$ are contained in a single dyadic cube of sidelength one. In the general case given in the above lemma, we can by dilating (if necessary) assume that all dyadic cubes in $\Lambda$ are contained in $[-1,1]^d$. We can then partition $\Lambda = \bigcup_{j=1}^{2^d} \Lambda_j$, where $\Lambda_j$, $j = 1,\dots,2^d$, is the set of cubes in $\Lambda$ that are contained in $I_j$, where $I_j$ is one of the $2^d$ dyadic cubes of sidelength one that make up $[-1,1]^d$. We apply the lemma (as stated in [10]) to each $\Lambda_j$ to obtain $\tilde\Lambda_j$. Then $\tilde\Lambda := \bigcup_{j=1}^{2^d} \tilde\Lambda_j$ satisfies the above lemma.
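The set of maximal cubes $B(I,\Lambda)$ can be computed directly from its definition. The following is a minimal sketch, assuming a dyadic cube $2^{-k}(j + [0,1)^d)$ is encoded as a pair `(k, j)` with `j` a tuple of integers; half-open cubes are used here so that two dyadic cubes intersect exactly when one contains the other. The function names are hypothetical, and the sketch does not construct the enlarged collection of Lemma 5.2.

```python
def contains(outer, inner):
    """True if the dyadic cube `inner` is a subset of the dyadic cube `outer`."""
    k_out, j_out = outer
    k_in, j_in = inner
    if k_in < k_out:                 # inner is strictly larger, so it cannot fit
        return False
    shift = k_in - k_out             # drop the finest `shift` binary digits of the index
    return all((a >> shift) == b for a, b in zip(j_in, j_out))


def maximal_subcubes(cube, collection):
    """Cubes of `collection` properly inside `cube` that lie in no other such cube."""
    proper = [c for c in collection if c != cube and contains(cube, c)]
    return [c for c in proper
            if not any(o != c and contains(o, c) for o in proper)]


# Illustrative collection in the plane (made up for this example).
unit = (0, (0, 0))
cubes = [unit, (1, (0, 0)), (2, (0, 0)), (2, (3, 3)), (1, (2, 0))]
print(maximal_subcubes(unit, cubes))
```

In this example the cube `(2, (0, 0))` is excluded because it sits inside `(1, (0, 0))`, and `(1, (2, 0))` is excluded because it lies outside the unit cube.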

We introduce one final notation before stating the main result of this section. Given a dyadic cube $I \in \mathcal{D}$, let

$$S(I) := \{ J \in \mathcal{D} : |J| = |I|,\ J \cap \mathrm{supp}\,\phi_I \ne \emptyset \}.$$

The cubes in $S(I)$ are called the support cubes of $\phi_I$. It is clear that $\#(S(I)) \le C(\varphi,d)$ because $\varphi$ has compact support.

The  following  theorem  is  the  main  result  of  this  section.  It  establishes  a  Bernstein 
inequality  for  hybrid  linear  combinations  of  scaling  functions  and  characteristic  functions 
of  dyadic  cubes. 

Theorem 5.3 If $\Lambda_1, \Lambda_2 \subset \mathcal{D}$ each has cardinality at most $N$ (i.e. $\#(\Lambda_1), \#(\Lambda_2) \le N$), then any function

$$f = \sum_{K\in\Lambda_1} a_K\phi_K + \sum_{K\in\Lambda_2} b_K\chi_K \tag{5.7}$$

satisfies

$$|f|_{BV(\mathbb{R}^d)} \le C(\varphi,d)\,N^{1/d}\,\|f\|_{L_{d^*}(\mathbb{R}^d)}.$$

Proof: By dilating $f$ (if necessary), we can assume that each of the functions $\phi_K$ and $\chi_K$ appearing in (5.7) is supported in $[-1,1]^d$. (Recall again that the $BV$ and $L_{d^*}$ norms scale the same under dilation.) Let $I_j$, $j = 1,\dots,2^d$, be the dyadic cubes of sidelength one that make up $[-1,1]^d$. We define

$$\Lambda := \Big(\bigcup_{K\in\Lambda_1} S(K)\Big) \cup \Lambda_2 \cup \{ I_j : j = 1,\dots,2^d \}.$$

We now apply Lemma 5.2 and obtain the set $\tilde\Lambda$ with $\#(\tilde\Lambda) \le C(\varphi,d)N$.

For each $I \in \tilde\Lambda$ we now define $I' := I \setminus \bigcup_{J\in B(I)} J$, where $B(I) = B(I,\tilde\Lambda)$. We have $\bigcup_{I\in\tilde\Lambda} I' = [-1,1]^d$ and the sets $I'$ are pairwise disjoint. Therefore

$$f = \sum_{I\in\tilde\Lambda} f\chi_{I'}. \tag{5.8}$$

Claim: For any $I \in \tilde\Lambda$ with $I \in \mathcal{D}_k$, each summand appearing in the representation (5.7) is in $S_k$ on $I'$.

To prove this claim, we first consider any $\phi_K$, $K \in \Lambda_1$, appearing in the first sum. If $|K| \ge |I|$ then $\phi_K \in S_k$ and we have our claim for this term. In the case $|K| < |I|$ let $J$ be any support cube of $\phi_K$ with $J \cap I \ne \emptyset$. Then $|J| = |K|$ and hence $J$ is contained in one of the cubes of $B(I)$ and hence $J \cap I' = \emptyset$. Thus such a $\phi_K$ is zero on $I'$. Thus we have established our claim for terms appearing in the first sum.

We now consider an arbitrary term $\chi_K$ appearing in the second sum for which $K \cap I \ne \emptyset$. If $|K| < |I|$ then $K$ is contained in one of the cubes in $B(I)$, which in turn means that $\chi_K$ is zero on $I'$. If $|K| \ge |I|$ then $\chi_K$ is identically one on $I'$. Since the constant functions are in $S_k$, we have proved our claim for the terms in the second sum as well.

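We can now complete the proof of the theorem by returning to (5.8). Because of the
claim, we can apply Lemma 5.1 to each term in (5.8) and thereby obtain

\[ |f|_{BV(\mathbb{R}^d)} \le \sum_{I \in \tilde\Lambda} |f\chi_{I'}|_{BV(\mathbb{R}^d)}
   \le C(\varphi,d) \sum_{I \in \tilde\Lambda} \|f\chi_{I'}\|_{L_{d^*}(\mathbb{R}^d)}. \]

On the other hand, from the Hölder inequality,

\[ \sum_{I \in \tilde\Lambda} \|f\chi_{I'}\|_{L_{d^*}(\mathbb{R}^d)}
   \le (\#(\tilde\Lambda))^{1/d} \Big( \sum_{I \in \tilde\Lambda} \|f\chi_{I'}\|_{L_{d^*}(\mathbb{R}^d)}^{d^*} \Big)^{1/d^*}
   = (\#(\tilde\Lambda))^{1/d}\, \|f\|_{L_{d^*}(\mathbb{R}^d)}
   \le C(\varphi,d)\, N^{1/d}\, \|f\|_{L_{d^*}(\mathbb{R}^d)}, \]

because the sets $I'$, $I \in \tilde\Lambda$, are disjoint. □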
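The Hölder step above can be checked numerically on a discrete model. The following is a minimal sketch, not part of the original argument: it replaces $f$ by a finite list of values split into disjoint pieces (standing in for the sets $I'$), and it assumes $d^* = d/(d-1)$, which is the exponent forced by the factor $(\#(\tilde\Lambda))^{1/d}$. The helper names `lp_norm` and `holder_sides` are hypothetical.

```python
def lp_norm(values, p):
    """Discrete l_p norm of a list of numbers."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

def holder_sides(pieces, d):
    """Return (lhs, rhs) of the Hoelder step for disjoint pieces of one function.

    lhs = sum of the L_{d*} norms of the pieces,
    rhs = (number of pieces)^(1/d) * L_{d*} norm of the whole function,
    with d* = d / (d - 1) (assumed definition of d*).
    """
    d_star = d / (d - 1)
    lhs = sum(lp_norm(piece, d_star) for piece in pieces)
    whole = [v for piece in pieces for v in piece]   # disjoint pieces: just concatenate
    rhs = len(pieces) ** (1.0 / d) * lp_norm(whole, d_star)
    return lhs, rhs

# Three disjoint pieces of a discrete "function", in dimensions d = 2 and d = 3.
pieces = [[1.0, -2.0], [0.5], [3.0, 0.0, 1.0]]
for d in (2, 3):
    lhs, rhs = holder_sides(pieces, d)
    print(d, lhs <= rhs)
```

The factor `len(pieces) ** (1.0 / d)` plays the role of $(\#(\tilde\Lambda))^{1/d}$; the equality in the middle of the displayed chain corresponds to concatenating the disjoint pieces before taking the norm.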

Theorem 5.3 contains many Bernstein inequalities as special cases. These are summarized
in the following corollary.


Corollary 5.4 The Bernstein inequality

\[ |f|_{BV(\mathbb{R}^d)} \le C(\varphi,d)\, N^{1/d}\, \|f\|_{L_{d^*}(\mathbb{R}^d)} \]

is valid whenever

(i)  /ge;/gE',/g  ttn. 

(ii)  /  G  Ejy-  ©  Ejy. 
Proof: Indeed, in each of these situations $f$ can be rewritten in the form (5.7) with each
of the two sums in (5.7) having at most $C(\varphi,d)N$ terms. □


6 Proof of Theorem 1.1 and Theorem 1.2

We can now prove Theorem 1.1. Given $f \in BV(\mathbb{R}^d)$, let $\mathcal{R}_N(f) \in \Pi_N$
be the function satisfying Lemma 4.4. We have

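\[ |\mathcal{G}_N(f)|_{BV(\mathbb{R}^d)}
   \le |\mathcal{G}_N(f) - \mathcal{R}_N(f)|_{BV(\mathbb{R}^d)} + |\mathcal{R}_N(f)|_{BV(\mathbb{R}^d)}
   \le C N^{1/d} \|\mathcal{G}_N(f) - \mathcal{R}_N(f)\|_{L_{d^*}(\mathbb{R}^d)} + |\mathcal{R}_N(f)|_{BV(\mathbb{R}^d)}. \tag{6.1} \]

In the last inequality we have used (ii) of Corollary 5.4 for the function
$\mathcal{G}_N(f) - \mathcal{R}_N(f)$.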

We now estimate the first term in (6.1) by

\[ N^{1/d} \|\mathcal{G}_N(f) - \mathcal{R}_N(f)\|_{L_{d^*}(\mathbb{R}^d)}
   \le N^{1/d} \big( \|\mathcal{G}_N(f) - f\|_{L_{d^*}(\mathbb{R}^d)} + \|f - \mathcal{R}_N(f)\|_{L_{d^*}(\mathbb{R}^d)} \big)
   \le C |f|_{BV(\mathbb{R}^d)}, \tag{6.2} \]

where in the last inequality we have used (4.2) and Lemma 4.2 to estimate the first term
and Lemma 4.4 to estimate the second term. It follows from Corollary 12 of [19] (see also
[4] for the case $d = 2$) that

\[ |\mathcal{R}_N(f)|_{BV(\mathbb{R}^d)} \le C |f|_{BV(\mathbb{R}^d)}. \tag{6.3} \]

Here we have used our general arguments of dilation and shifts to deduce (6.3). Using
the estimates (6.2) and (6.3) in (6.1) gives the desired estimate. □

Theorem  1.2  can  be  proved  exactly  as  Theorem  12  in  [19]. 


7  Further  discussion 

We briefly discuss some further issues which will help put our results into perspective.

7.1  The  case  d  =  1 

Theorem 1.1 does not hold in the case $d = 1$. Consider, for example, the function
$f = \chi_{[0,1/3]}$ which is in BV([0,1]). We take the Haar basis $H_\lambda$, $\lambda \in \Lambda$,
normalized in BV([0,1]): $|H_\lambda|_{BV([0,1])} = 1$. This is the same as normalizing this basis
in $L_\infty([0,1])$. For each dyadic level $k = 0, 1, \dots$, there is exactly one Haar coefficient
that is nonzero (it corresponds to the dyadic interval $I \in \mathcal{D}_k$ which contains 1/3).
This coefficient $c_\lambda(f)$ has absolute value 1/3 so that $c_\lambda(f) H_\lambda(1/3) = \pm 1/3$.
For any given $N$, we can take $N$ of these intervals so that all of the numbers
$c_\lambda(f) H_\lambda(1/3)$ have the same sign. Then, the function $\mathcal{G}_N(f)$ obtained by
retaining exactly these $N$ terms of the Haar expansion of $f$ will have BV([0,1]) norm
$\ge N/3$. If one wants to avoid the question of choosing arbitrarily in the case of ties,
then one can perturb these coefficients slightly.
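The counterexample can be reproduced with exact rational arithmetic. The sketch below is an illustration under stated assumptions, not the authors' computation: it uses Haar functions taking the values $+1$ on the left half and $-1$ on the right half of their dyadic interval (the $L_\infty$ normalization, ignoring the constant factor relating it to the BV normalization), and the helper names (`level_interval`, `haar_value`, `haar_coeff`, `partial_sum`) are hypothetical.

```python
from fractions import Fraction

def level_interval(k, x):
    """Return (a, h) for the dyadic interval [a, a + h) of level k containing x."""
    h = Fraction(1, 2 ** k)
    a = (x // h) * h                     # floor(x / h) * h
    return a, h

def haar_value(a, h, x):
    """Haar function on [a, a + h): +1 on the left half, -1 on the right half, 0 outside."""
    if not (a <= x < a + h):
        return 0
    return 1 if x < a + h / 2 else -1

def haar_coeff(a, h, t):
    """Coefficient of f = chi_[0, t] on the Haar function of [a, a + h), assuming a >= 0."""
    mid = a + h / 2
    left = max(Fraction(0), min(t, mid) - a)        # measure of [a, mid) inside [0, t]
    right = max(Fraction(0), min(t, a + h) - mid)   # measure of [mid, a + h) inside [0, t]
    return (left - right) / h

def partial_sum(levels, t, x):
    """Value at x of the sum of the Haar terms of chi_[0, t] at the given levels."""
    total = Fraction(0)
    for k in levels:
        a, h = level_interval(k, t)      # the only interval of level k with a nonzero coefficient
        total += haar_coeff(a, h, t) * haar_value(a, h, x)
    return total

t = Fraction(1, 3)
N = 5
levels = [2 * i for i in range(N)]       # even levels: all terms have the same sign at x = 1/3
peak = partial_sum(levels, t, t)                 # value of the N-term sum near x = 1/3
tail = partial_sum(levels, t, Fraction(3, 4))    # value far from 1/3
print(peak, tail, peak - tail)
```

The difference `peak - tail` is a lower bound for the variation of the $N$-term sum on $[0,1]$, so it grows at least like $N/3$ while the variation of $f$ itself stays fixed.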
7.2 Quasi-greedy bases

Let $X$ be a Banach space and $\{b_\lambda\}_{\lambda \in \Lambda}$ be a (Schauder) basis for $X$ with
$\|b_\lambda\|_X = 1$ for all $\lambda \in \Lambda$. Each $f \in X$ has a unique basis expansion
$f = \sum_{\lambda \in \Lambda} c_\lambda(f) b_\lambda$. We define the greedy approximant
$\mathcal{G}_N(f)$ as before:

\[ \mathcal{G}_N(f) := \sum_{\lambda \in \Lambda_N(f)} c_\lambda(f) b_\lambda, \]

where $\Lambda_N(f)$ is the set of indices corresponding to the $N$ largest coefficients in absolute
value (with ties handled in an arbitrary way).

The basis $\{b_\lambda\}$ for the space $X$ is said to be quasi-greedy if
$\|f - \mathcal{G}_N(f)\|_X \to 0$ as $N \to \infty$.

It is known that the Haar basis is not quasi-greedy for $L_1(\mathbb{R}^d)$ (see [13]). On the other
hand, it follows from what we have proved in this paper that the wavelet bases are
quasi-greedy in $W^1(L_1(\mathbb{R}^d))$. Indeed, it was proved in [18] that a basis is quasi-greedy
for $X$ if and only if

\[ \|\mathcal{G}_N(f)\|_X \le C \|f\|_X, \quad f \in X, \]

with $C > 0$ an absolute constant. From Theorem 1.1, we know that the wavelet bases
satisfy

\[ |\mathcal{G}_N(f)|_{W^1(L_1(\mathbb{R}^d))} \le C(\varphi,d)\, |f|_{W^1(L_1(\mathbb{R}^d))}. \tag{7.1} \]

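We want to change from semi-norm to norm in (7.1), which we can accomplish as follows.
Since the basis $\{\psi_\lambda\}_{\lambda \in \Lambda}$ is normalized in $BV(\mathbb{R}^d)$, it follows that

\[ \|\psi_\lambda\|_{L_1(\mathbb{R}^d)} \le C 2^{-k}, \quad |\lambda| = k. \]

Secondly, we have the embedding $W^1(L_1(\mathbb{R}^d)) \subset B^1_\infty(L_1(\mathbb{R}^d))$ and

\[ \|f\|_{B^1_\infty(L_1(\mathbb{R}^d))} \le C(\varphi)\, \|f\|_{W^1(L_1(\mathbb{R}^d))}, \]

where $B^1_\infty(L_1(\mathbb{R}^d))$ is the Besov space whose norm is given by

\[ \|f\|_{B^1_\infty(L_1(\mathbb{R}^d))} := \sup_{k \ge 0} \sum_{|\lambda| = k} |c_\lambda(f)|. \]

Therefore, taking any index set $\Lambda' \subset \Lambda$ (not necessarily a greedy selection), we have

\[ \Big\| \sum_{\lambda \in \Lambda'} c_\lambda(f) \psi_\lambda \Big\|_{L_1(\mathbb{R}^d)}
   \le C \sum_{k=0}^{\infty} 2^{-k} \sum_{\lambda \in \Lambda',\, |\lambda| = k} |c_\lambda(f)|
   \le C \|f\|_{W^1(L_1(\mathbb{R}^d))}. \]

Hence, we can add the $L_1(\mathbb{R}^d)$-norm of $\mathcal{G}_N(f)$ to the left side of (7.1) and
replace the $W^1(L_1(\mathbb{R}^d))$ semi-norm of $f$ by the $W^1(L_1(\mathbb{R}^d))$ norm and obtain that

\[ \|\mathcal{G}_N(f)\|_{W^1(L_1(\mathbb{R}^d))} \le C(\varphi,d)\, \|f\|_{W^1(L_1(\mathbb{R}^d))}. \]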
7.3 Thresholding

Continuing with our setting of a wavelet basis $\{\psi_\lambda\}_{\lambda \in \Lambda}$ normalized for
$BV(\mathbb{R}^d)$, for each $\epsilon > 0$, we define the hard thresholding operator

\[ T_\epsilon(f) := \sum_{\lambda \in \Lambda(f,\epsilon)} c_\lambda(f) \psi_\lambda, \]

where $\Lambda(f,\epsilon) := \{\lambda : |c_\lambda(f)| > \epsilon\}$. It follows from Theorem 1.1 that
this operator is bounded on $BV(\mathbb{R}^d)$:

\[ |T_\epsilon(f)|_{BV(\mathbb{R}^d)} \le C(\varphi,d)\, |f|_{BV(\mathbb{R}^d)}, \quad f \in BV(\mathbb{R}^d). \]
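The reason Theorem 1.1 applies is that hard thresholding at level $\epsilon$ is a greedy approximation with $N = \#\Lambda(f,\epsilon)$ terms. The following minimal sketch illustrates this on a finite coefficient list; the functions `greedy` and `hard_threshold` are hypothetical helpers acting on coefficients only, not on the wavelets themselves.

```python
def greedy(coeffs, n):
    """Keep the n coefficients of largest absolute value, zero out the rest."""
    order = sorted(range(len(coeffs)), key=lambda j: abs(coeffs[j]), reverse=True)
    keep = set(order[:n])
    return [c if j in keep else 0 for j, c in enumerate(coeffs)]

def hard_threshold(coeffs, eps):
    """Keep exactly the coefficients with |c| > eps."""
    return [c if abs(c) > eps else 0 for c in coeffs]

coeffs = [0.5, -2.0, 0.1, 1.5, -0.1]
eps = 0.3
n = sum(1 for c in coeffs if abs(c) > eps)   # number of coefficients above the threshold
print(hard_threshold(coeffs, eps) == greedy(coeffs, n))
```

No tie-breaking issue arises here: every coefficient above the threshold is strictly larger in absolute value than every coefficient at or below it, so the set of the `n` largest is uniquely determined.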

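There is another version of thresholding (called soft thresholding) which is preferred in
some problems of statistical optimization. To describe soft thresholding, we fix a function
$\eta(t)$ defined on $[0, \infty)$ such that $\eta$ is increasing and

\[ 0 \le \eta(t) \le 1 \ \text{for all } t, \qquad
   \eta(t) = 0 \ \text{for } 0 \le t \le 1/2, \qquad
   \eta(t) = 1 \ \text{for } t \ge 1. \]

Given $\epsilon > 0$ and $f \in BV(\mathbb{R}^d)$, we define the soft thresholding operator
$T^\eta_\epsilon$ by

\[ T^\eta_\epsilon(f) := \sum_{\lambda \in \Lambda} \eta\big(|c_\lambda(f)|/\epsilon\big)\, c_\lambda(f)\, \psi_\lambda. \]

Claim: For each $\epsilon > 0$, we have

\[ |T^\eta_\epsilon(f)|_{BV(\mathbb{R}^d)} \le C(\varphi,d)\, |f|_{BV(\mathbb{R}^d)}, \quad f \in BV(\mathbb{R}^d). \]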

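Proof of Claim: We order the coefficients $c_\lambda := c_\lambda(f)$ of $f$ in decreasing order as
$|c_{\lambda_1}| \ge |c_{\lambda_2}| \ge \dots$. We fix integers $N_0 < N_1 < \dots < N_s$ and numbers
$1 =: \beta_0 > \beta_1 > \dots > \beta_s > \beta_{s+1} := 1/2$ in such a way that

\[ |c_{\lambda_j}| \ge \epsilon = \beta_0 \epsilon \quad \text{for } j \le N_0, \]
\[ |c_{\lambda_j}| \le \epsilon/2 = \beta_{s+1} \epsilon \quad \text{for } j > N_s, \]
\[ |c_{\lambda_j}| = \beta_{i+1} \epsilon \quad \text{for } N_i < j \le N_{i+1}, \ i = 0, \dots, s-1. \]

One checks that

\[ T^\eta_\epsilon(f) = \sum_{i=0}^{s} [\eta(\beta_i) - \eta(\beta_{i+1})]\, \mathcal{G}_{N_i}(f), \]

so from the triangle inequality we get

\[ |T^\eta_\epsilon(f)|_{BV(\mathbb{R}^d)}
   \le \sum_{i=0}^{s} [\eta(\beta_i) - \eta(\beta_{i+1})]\, |\mathcal{G}_{N_i}(f)|_{BV(\mathbb{R}^d)}
   \le C(\varphi,d)\, |f|_{BV(\mathbb{R}^d)} \sum_{i=0}^{s} [\eta(\beta_i) - \eta(\beta_{i+1})]
   = C(\varphi,d)\, |f|_{BV(\mathbb{R}^d)}. \]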
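The telescoping identity used in the proof can be verified on a finite coefficient list. The sketch below is an illustration under assumptions: it takes the soft thresholding weight to be $\eta(|c_\lambda(f)|/\epsilon)$, picks one particular admissible $\eta$ (piecewise linear between $1/2$ and $1$), and works with coefficients only; all function names are hypothetical.

```python
from fractions import Fraction

def eta(t):
    """One admissible choice (an assumption): 0 on [0, 1/2], linear on [1/2, 1], 1 on [1, inf)."""
    return min(Fraction(1), max(Fraction(0), 2 * t - 1))

def soft_threshold_direct(coeffs, eps):
    """Apply the weight eta(|c| / eps) to each coefficient."""
    return [eta(abs(c) / eps) * c for c in coeffs]

def soft_threshold_telescoped(coeffs, eps):
    """Same operator written as sum_i [eta(beta_i) - eta(beta_{i+1})] * G_{N_i}."""
    ratios = [abs(c) / eps for c in coeffs]
    inner = sorted({r for r in ratios if Fraction(1, 2) < r < 1}, reverse=True)
    betas = [Fraction(1)] + inner + [Fraction(1, 2)]      # beta_0 > beta_1 > ... > beta_{s+1}
    out = [Fraction(0)] * len(coeffs)
    for i in range(len(betas) - 1):
        weight = eta(betas[i]) - eta(betas[i + 1])        # nonnegative because eta is increasing
        for j, r in enumerate(ratios):
            if r >= betas[i]:                             # coefficient belongs to G_{N_i}
                out[j] += weight * coeffs[j]
    return out

coeffs = [Fraction(3), Fraction(-9, 10), Fraction(3, 4),
          Fraction(-3, 4), Fraction(2, 5), Fraction(0)]
eps = Fraction(1)
print(soft_threshold_direct(coeffs, eps) == soft_threshold_telescoped(coeffs, eps))
```

The weights `eta(betas[i]) - eta(betas[i + 1])` are nonnegative and sum to $\eta(1) - \eta(1/2) = 1$, which is exactly what makes the final equality in the proof hold.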

Note that the above claim and its proof, although stated for the space $BV(\mathbb{R}^d)$, hold
for any Banach space $X$ (used in place of $BV(\mathbb{R}^d)$) which has a quasi-greedy basis (used
in place of $\{\psi_\lambda\}_{\lambda \in \Lambda}$).
7.4 The case of domains $\Omega \subset \mathbb{R}^d$

Versions of Theorem 1.1 remain valid for $BV(\Omega)$ with $\Omega$ certain domains in
$\mathbb{R}^d$. We briefly mention two of the typical settings.

For certain domains $\Omega \subset \mathbb{R}^d$, one can construct wavelet bases
$\{\psi_\lambda\}_{\lambda \in \Lambda}$ such that $\operatorname{supp}(\psi_\lambda) \subset \Omega$ for each
$\lambda \in \Lambda$. The $\psi_\lambda$ whose support is sufficiently inside the interior of the
domain are the usual wavelets on $\mathbb{R}^d$. Near the boundary, the $\psi_\lambda$ have a
different structure. The first examples of such constructions were made in [3] for an interval
on $\mathbb{R}$. These constructions were then extended to certain multidimensional domains (such
as polyhedral domains) (see [7]) and then ultimately to quite general domains in [1]. These
constructed bases have the three main properties we need to prove Theorem 1.1. They
are of compact support. The scaling functions on a given dyadic level form a partition of
unity. The scaling functions and wavelets on a dyadic level $k$ can be written as a linear
combination of a fixed number of scaling functions at level $k + 1$. Thus, an analogue of
Theorem 1.1 is valid for such bases, where now the $BV(\mathbb{R}^d)$ norm is replaced by the
$BV(\Omega)$ norm.

The second setting applies to quite general domains $\Omega \subset \mathbb{R}^d$. For example, it is
sufficient that $\Omega$ is a Lipschitz graph domain (a minimally smooth domain in the sense
of Stein (see [16], p. 180)). Any function $f$ in $BV(\Omega)$ can be extended to a function $Ef$ in
$BV(\mathbb{R}^d)$ satisfying

\[ \|Ef\|_{BV(\mathbb{R}^d)} \le C(\Omega)\, \|f\|_{BV(\Omega)}. \]

Such extension theorems are typically proved for the space $W^1(L_1(\Omega))$ and then follow
for $BV(\Omega)$ by a limiting argument. We can expand $Ef$ in a wavelet expansion

\[ Ef = \sum_{\lambda \in \Lambda} c_\lambda(Ef)\, \psi_\lambda. \]

This decomposition serves as a wavelet representation for $f$ on $\Omega$:

\[ f = \sum_{\lambda \in \Lambda(\Omega)} c_\lambda(Ef)\, \psi_\lambda, \]

where $\Lambda(\Omega)$ is the set of all indices $\lambda \in \Lambda$ for which $\psi_\lambda$ does not
vanish identically on $\Omega$.

Consider now the thresholding operator $T_\epsilon$ applied to $f$ and $Ef$. Since
$T_\epsilon(f) = T_\epsilon(Ef)$ on $\Omega$, we deduce that

\[ |T_\epsilon(f)|_{BV(\Omega)} = |T_\epsilon(Ef)|_{BV(\Omega)} \le |T_\epsilon(Ef)|_{BV(\mathbb{R}^d)}
   \le C(\varphi,d)\, |Ef|_{BV(\mathbb{R}^d)} \le C(\varphi,d,\Omega)\, \|f\|_{BV(\Omega)}. \]

In general, we cannot replace the norm on the right by the semi-norm.